\newpage

\definecolor{dkgreen}{rgb}{0,0.6,0}
\definecolor{gray}{rgb}{0.5,0.5,0.5}
\definecolor{mauve}{rgb}{0.58,0,0.82}

\lstset{ %
	language=Matlab,                % the language of the code
	basicstyle=\footnotesize,           % the size of the fonts that are used for the code
	numbers=left,                   % where to put the line-numbers
	numberstyle=\tiny\color{gray},  % the style that is used for the line-numbers
	stepnumber=2,                   % the step between two line-numbers. If it's 1, each line 
	% will be numbered
	numbersep=5pt,                  % how far the line-numbers are from the code
	backgroundcolor=\color{white},      % choose the background color. You must add \usepackage{color}
	showspaces=false,               % show spaces adding particular underscores
	showstringspaces=false,         % underline spaces within strings
	showtabs=false,                 % show tabs within strings adding particular underscores
	frame=single,                   % adds a frame around the code
	rulecolor=\color{black},        % if not set, the frame-color may be changed on line-breaks within not-black text (e.g. commens (green here))
	tabsize=2,                      % sets default tabsize to 2 spaces
	captionpos=b,                   % sets the caption-position to bottom
	breaklines=true,                % sets automatic line breaking
	breakatwhitespace=false,        % sets if automatic breaks should only happen at whitespace
	title=\lstname,                   % show the filename of files included with \lstinputlisting;
	% also try caption instead of title
	keywordstyle=\color{blue},          % keyword style
	commentstyle=\color{dkgreen},       % comment style
	stringstyle=\color{mauve},         % string literal style
	escapeinside={\%*}{*)},            % if you want to add LaTeX within your code
	morekeywords={*,...}               % if you want to add more keywords to the set
}

\section{上手实践}

以上对迁移学习基本方法的介绍还只是停留在算法的阶段。对于初学者来说，不仅需要掌握基础的算法知识，更重要的是，需要在实验中发现问题。本章的目的是为初学者提供一个迁移学习上手实践的介绍。通过一步步编写代码、下载数据，完成迁移学习任务。在本部分，我们以迁移学习中最为流行的图像分类为实验对象，在流行的Office+Caltech10数据集上完成。

迁移学习方法主要包括：传统的非深度迁移、深度网络的finetune、深度网络自适应、以及深度对抗网络的迁移。教程的目的是抛砖引玉，帮助初学者快速入门。由于网络上已有成型的深度网络的finetune、深度网络自适应、以及深度对抗网络的迁移教程，因此我们不再叙述这些方法，只在这里介绍非深度方法的教程。其他三种方法的地址分别是：

\begin{itemize}
	\item 深度网络的finetune：\href{https://cosx.org/2017/10/transfer-learning/}{深度迁移学习识花训练(Python+Tensorflow)}、\href{https://github.com/miguelgfierro/sciblog_support/blob/master/A_Gentle_Introduction_to_Transfer_Learning/Intro_Transfer_Learning.ipynb}{使用PyTorch进行finetune}
	\item 深度网络的自适应：\href{https://github.com/tensorflow/models/tree/master/research/domain_adaptation}{DSN方法}
	\item 深度对抗网络迁移：\href{https://github.com/erictzeng/adda}{ADDA方法(Tensorflow)}以及\href{https://github.com/jindongwang/tf-dann}{DANN方法(Tensorflow)}
\end{itemize}

%\subsection{非深度迁移}

在众多的非深度迁移学习方法中，我们选择发表于ICCV-13的JDA(Joint Adaptation Network)~\cite{long2013transfer}方法进行实践。实验平台为普通机器上的Matlab软件。

\textbf{1. 数据获取}

由于我们要测试非深度方法，因此，选择SURF特征文件作为算法的输入。SURF特征文件可以从网络上~\footnote{\url{https://pan.baidu.com/s/1bp4g7Av}}下载。下载到的文件主要包含4个.mat文件：Caltech.mat, amazon.mat, webcam.mat, dslr.mat。它们恰巧对应4个不同的领域。彼此之间两两一组，就是一个迁移学习任务。每个数据文件包含两个部分：fts为800维的特征，labels为对应的标注。在测试中，我们选择由Caltech.mat作为源域，由amazon.mat作为目标域。Office+Caltech10数据集的介绍可以在本手册的第~\ref{sec-dataset}部分找到。

我们对数据进行加载并做简单的归一化，将最后的数据存入$X_s,Y_s,X_t,Y_t$这四个变量中。这四个变量分别对应源域的特征和标注、以及目标域的特征和标注。代码如下：

\begin{lstlisting}[title=Matlab加载数据, frame=shadowbox]
load('Caltech.mat');     % source domain
fts = fts ./ repmat(sum(fts,2),1,size(fts,2)); 
Xs = zscore(fts,1);    clear fts
Ys = labels;           clear labels

load('amazon.mat');    % target domain
fts = fts ./ repmat(sum(fts,2),1,size(fts,2)); 
Xt = zscore(fts,1);     clear fts
Yt = labels;            clear labels
\end{lstlisting}

\textbf{2. 算法精炼}

JDA主要进行边缘分布和条件分布的自适应。通过整理化简，JDA最终的求解目标是：

\begin{equation}
\label{equ-eigen}
\begin{split}
\left(\mathbf{X} \sum_{c=0}^{C} \mathbf{M}_c \mathbf{X}^\top + \lambda \mathbf{I}\right) \mathbf{A} =\mathbf{X} \mathbf{H} \mathbf{X}^\top \mathbf{A} \Phi 
\end{split}
\end{equation}

上述表达式可以通过Matlab自带的$\mathrm{eigs()}$函数直接求解。$\mathbf{A}$就是我们要求解的变换矩阵。下面我们需要明确各个变量所指代的含义：

\begin{itemize}
	\item $\mathbf{X}$: 由源域和目标域数据共同构成的数据矩阵
	\item $C$: 总的类别个数。在我们的数据集中，$C=10$
	\item $\mathbf{M}_c$: MMD矩阵。当$c=0$时为全MMD矩阵；当$c>1$时对应为每个类别的矩阵。
	\item $\mathbf{I}$：单位矩阵
	\item $\lambda$：平衡参数，直接给出
	\item $\mathbf{H}$: 中心矩阵，直接计算得出
	\item $\Phi$: 拉格朗日因子，不用理会，求解用不到
\end{itemize}

\textbf{3. 编写代码}

我们参考JDA开源的代码，直接给出精炼后的源码：

\begin{lstlisting}[title=JDA方法的Matlab实现, frame=shadowbox]
function [acc,acc_ite,A] = MyJDA(X_src,Y_src,X_tar,Y_tar,options)
% This is the implementation of Joint Distribution Adaptation.
% Reference: Mingsheng Long et al. Transfer feature learning with joint distribution adaptation. ICCV 2013.

% Inputs:
%%% X_src          :     source feature matrix, ns * n_feature
%%% Y_src          :     source label vector, ns * 1
%%% X_tar          :     target feature matrix, nt * n_feature
%%% Y_tar          :     target label vector, nt * 1
%%% options        :     option struct
%%%%% lambda       :     regularization parameter
%%%%% dim          :     dimension after adaptation, dim <= n_feature
%%%%% kernel_tpye  :     kernel name, choose from 'primal' | 'linear' | 'rbf'
%%%%% gamma        :     bandwidth for rbf kernel, can be missed for other kernels
%%%%% T            :     n_iterations, T >= 1. T <= 10 is suffice

% Outputs:
%%% acc            :     final accuracy using knn, float
%%% acc_ite        :     list of all accuracies during iterations
%%% A              :     final adaptation matrix, (ns + nt) * (ns + nt)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% Set options
lambda = options.lambda;              
dim = options.dim;                    
kernel_type = options.kernel_type;    
gamma = options.gamma;               
T = options.T;                        

acc_ite = [];
Y_tar_pseudo = [];
%% Iteration
for i = 1 : T
[Z,A] = JDA_core(X_src,Y_src,X_tar,Y_tar_pseudo,options);
%normalization for better classification performance
Z = Z*diag(sparse(1./sqrt(sum(Z.^2))));
Zs = Z(:,1:size(X_src,1));
Zt = Z(:,size(X_src,1)+1:end);

knn_model = fitcknn(Zs',Y_src,'NumNeighbors',1);
Y_tar_pseudo = knn_model.predict(Zt');
acc = length(find(Y_tar_pseudo==Y_tar))/length(Y_tar); 
fprintf('JDA+NN=%0.4f\n',acc);
acc_ite = [acc_ite;acc];
end

end

function [Z,A] = JDA_core(X_src,Y_src,X_tar,Y_tar_pseudo,options)
%% Set options
lambda = options.lambda;              %% lambda for the regularization
dim = options.dim;                    %% dim is the dimension after adaptation, dim <= m
kernel_type = options.kernel_type;    %% kernel_type is the kernel name, primal|linear|rbf
gamma = options.gamma;                %% gamma is the bandwidth of rbf kernel

%% Construct MMD matrix
X = [X_src',X_tar'];
X = X*diag(sparse(1./sqrt(sum(X.^2))));
[m,n] = size(X);
ns = size(X_src,1);
nt = size(X_tar,1);
e = [1/ns*ones(ns,1);-1/nt*ones(nt,1)];
C = length(unique(Y_src));

%%% M0
M = e * e' * C;  %multiply C for better normalization

%%% Mc
N = 0;
if ~isempty(Y_tar_pseudo) && length(Y_tar_pseudo)==nt
for c = reshape(unique(Y_src),1,C)
e = zeros(n,1);
e(Y_src==c) = 1 / length(find(Y_src==c));
e(ns+find(Y_tar_pseudo==c)) = -1 / length(find(Y_tar_pseudo==c));
e(isinf(e)) = 0;
N = N + e*e';
end
end

M = M + N;
M = M / norm(M,'fro');

%% Centering matrix H
H = eye(n) - 1/n * ones(n,n);

%% Calculation
if strcmp(kernel_type,'primal')
[A,~] = eigs(X*M*X'+lambda*eye(m),X*H*X',dim,'SM');
Z = A'*X;
else
K = kernel_jda(kernel_type,X,[],gamma);
[A,~] = eigs(K*M*K'+lambda*eye(n),K*H*K',dim,'SM');
Z = A'*K;
end

end

% With Fast Computation of the RBF kernel matrix
% To speed up the computation, we exploit a decomposition of the Euclidean distance (norm)
%
% Inputs:
%       ker:    'linear','rbf','sam'
%       X:      data matrix (features * samples)
%       gamma:  bandwidth of the RBF/SAM kernel
% Output:
%       K: kernel matrix
%
% Gustavo Camps-Valls
% 2006(c)
% Jordi (jordi@uv.es), 2007
% 2007-11: if/then -> switch, and fixed RBF kernel
% Modified by Mingsheng Long
% 2013(c)
% Mingsheng Long (longmingsheng@gmail.com), 2013

function K = kernel_jda(ker,X,X2,gamma)

switch ker
case 'linear'

if isempty(X2)
K = X'*X;
else
K = X'*X2;
end

case 'rbf'

n1sq = sum(X.^2,1);
n1 = size(X,2);

if isempty(X2)
D = (ones(n1,1)*n1sq)' + ones(n1,1)*n1sq -2*X'*X;
else
n2sq = sum(X2.^2,1);
n2 = size(X2,2);
D = (ones(n2,1)*n1sq)' + ones(n1,1)*n2sq -2*X'*X2;
end
K = exp(-gamma*D); 

case 'sam'

if isempty(X2)
D = X'*X;
else
D = X'*X2;
end
K = exp(-gamma*acos(D).^2);

otherwise
error(['Unsupported kernel ' ker])
end
end

\end{lstlisting}

我们将JDA方法包装成函数$\mathrm{MyJDA}$。函数共接受5个输入参数：

\begin{itemize}
	\item $\mathrm{X_{src}}$: 源域的特征，大小为$n_s \times m$
	\item $\mathrm{Y_{src}}$: 源域的标注，大小为$n_s \times 1$
	\item $\mathrm{X_{tar}}$: 目标域的特征，大小为$n_t \times m$
	\item $\mathrm{Y_{tar}}$: 目标域的标注，大小为$n_t \times 1$
	\item $\mathrm{options}$: 参数结构体，它包含：
	\begin{itemize}
		\item $\lambda$c: 平衡参数，可以自由给出
		\item $T$: 算法迭代次数
		\item $dim$: 算法最终选择将数据将到多少维
		\item $kernel type$: 选择的核类型，可以选择RBF、线性、或无核
		\item $\gamma$: 如果选择RBF核，那么它的宽度为$\gamma$
	\end{itemize}
\end{itemize}

函数的输出包含3项：
\begin{itemize}
	\item $acc$: 算法的精度
	\item $acc_{iter}$: 算法每次迭代的精度，是一个一维数据
	\item $A$: 最终的变换矩阵
\end{itemize}

\textbf{4. 测试算法}

我们使用如下的代码对JDA算法进行测试：

\begin{lstlisting}
options.T = 10;             % #iterations, default=10
options.gamma = 2;          % the parameter for kernel
options.kernel_type = 'linear';
options.lambda = 1.0;
options.dim = 20;
[Acc,Acc_iter,A] = MyJDA(Xs,Ys,Xt,Yt,options);
disp(Acc);
\end{lstlisting}

结果显示如下：
\begin{lstlisting}
	Iteration [ 1]:BDA+NN=0.4499
	Iteration [ 2]:BDA+NN=0.4342
	Iteration [ 3]:BDA+NN=0.4395
	Iteration [ 4]:BDA+NN=0.4363
	Iteration [ 5]:BDA+NN=0.4395
	Iteration [ 6]:BDA+NN=0.4468
	Iteration [ 7]:BDA+NN=0.4457
	Iteration [ 8]:BDA+NN=0.4489
	Iteration [ 9]:BDA+NN=0.4509
	Iteration [10]:BDA+NN=0.4551
\end{lstlisting}

\textbf{5. 小结}

通过以上过程，我们使用Matlab代码对JDA方法进行了实验，完成了一个迁移学习任务。其他的非深度迁移学习方法，均可以参考上面的过程。值得庆幸的是，许多论文的作者都公布了他们的文章代码，以方便我们进行接下来的研究。读者可以从Github~\footnote{\url{https://github.com/jindongwang/transferlearning/tree/master/code}}或者相关作者的网站上获取其他许多方法的代码。

%\subsection{深度网络的finetune}
%
%\subsection{深度网络自适应}
%
%\subsection{深度对抗网络迁移}